New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bounds checking #4432
Bounds checking #4432
Conversation
This is required to be able to raise exceptions, for instance, for bounds checking.
So far we need to: - Add tests - Make the error message match object mode - Add the flag to the public API (right now it is enabled by default for testing purposes)
numba/cgutils.py
Outdated
with if_unlikely(builder, out_of_bounds_upper): | ||
context.call_conv.return_user_exc(builder, IndexError, (msg,)) | ||
out_of_bounds_lower = builder.icmp_signed('<', ind, ind.type(0)) | ||
with if_unlikely(builder, out_of_bounds_lower): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I merge these two ifs? I didn't see how to do a boolean OR in the llvmlite documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You possibly want something like this:
out_of_bounds_upper = builder.icmp_signed('>=', ind, dimlen)
out_of_bounds_lower = builder.icmp_signed('<', ind, ind.type(0))
predicate = builder.or_(out_of_bounds_upper, out_of_bounds_lower)
with if_unlikely(builder, predicate):
context.call_conv.return_user_exc(builder, IndexError, (msg,))
Docs: http://llvmlite.pydata.org/en/latest/user-guide/ir/ir-builder.html#llvmlite.ir.IRBuilder.or_
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It says or_
is bitwise OR (like |
instead of ||
). I don't know if that matters here. I mainly want to make sure I generate code that LLVM optimizes correctly (how can I check that?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is indeed bitwise or
, suggested that for same net effect, though it will likely generate an or
op whereas a logical or
would likely generate jump(s). To get something logical or
-like (i.e. ||
) it's is a bit like what you have already but the branches need nesting so that the second condition is only checked if the first passes, I'm assuming that's what you are thinking of?
You can check the LLVM IR with NUMBA_DUMP_LLVM=1
as an env var, or the .inspect_llvm
method on the dispatcher. The assembly can be seen with the .inspect_asm
method on the dispatcher. How were you hoping it to optimise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean if the index is already known to be less than the dimension or nonnegative it should optimize out the if
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm reading the LLVM correctly (and I'm not very good at reading LLVM, so I could be wrong), it saves the IndexError as a predefined constant. So that adds a bit of overhead to every function call (though it looks like it does it for every exception, so maybe that's not my problem to solve here). Also it implies that it would be more efficient to have only one
return_user_exc
here.
I'm not sure how this adds overhead to a function call, any chance you could please expand on this? Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think under very specific conditions it might be possible to do this, but all the information would likely have to be constant at compile time. Also, any if from the pattern being added here will likely create a branch in a tight indexing loop and prevent vectorization.
So if I have something like
A = 0
for i in range(x.shape[0]):
A += x[i]
it won't be able to deduce that 0 <= i < x.shape[0]
and so the bounds checking can be optimized away?
I'm not sure how this adds overhead to a function call, any chance you could please expand on this? Thanks.
I think I misread the LLVM. The exceptions are saved as constant pickles on the functions, but are only unpickled when needed. So it adds a memory overhead to the function definition, but that's it. It's probably better to not have the exception there twice but maybe not a huge deal.
More importantly, since the exceptions are precomputed, I don't see how to raise an exception with a dynamic message, namely including the dimension in the error message like NumPy does. Would we need to add support for pickling a function that returns the exception with the given message? Or is this already something that can be done? If it's not easy to do this, I might skip it for this PR, if that's alright.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think under very specific conditions it might be possible to do this, but all the information would likely have to be constant at compile time. Also, any if from the pattern being added here will likely create a branch in a tight indexing loop and prevent vectorization.
So if I have something like
A = 0 for i in range(x.shape[0]): A += x[i]it won't be able to deduce that
0 <= i < x.shape[0]
and so the bounds checking can be optimized away?
In theory, yes, but it depends on what code gets generated. I think that if an unsigned comparison operator were used in the bounds check then LLVM is more likely to be able to work it out, if a signed comparison is done then it won't. I just generated the assembly for:
- no boundscheck
- boundscheck with signed comparison
- boundscheck with unsigned comparison
Cases 1. and 3.are reasonably similar but 1. has a deeper loop unroll. 2. has no vectorisation because there's a test/jump in the middle of the loop.
I think more can be done in terms of identifying unsigned ranges and suspect that this is likely to help.
I'm not sure how this adds overhead to a function call, any chance you could please expand on this? Thanks.
I think I misread the LLVM. The exceptions are saved as constant pickles on the functions, but are only unpickled when needed. So it adds a memory overhead to the function definition, but that's it. It's probably better to not have the exception there twice but maybe not a huge deal.
More importantly, since the exceptions are precomputed, I don't see how to raise an exception with a dynamic message, namely including the dimension in the error message like NumPy does. Would we need to add support for pickling a function that returns the exception with the given message? Or is this already something that can be done? If it's not easy to do this, I might skip it for this PR, if that's alright.
Numba stores exception strings as compile time constants so you cannot raise an exception with a dynamically generated string. So yes, skipping it is fine :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made all range(x)
calls unsigned in #4446 and was reminded of the amount of work needed to implement this correctly due to the NumPy casting rule (generalizing as):
uint* <op> int* -> float
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Numba stores exception strings as compile time constants so you cannot raise an exception with a dynamically generated string. So yes, skipping it is fine :)
I think you could serialize a function that creates the IndexError rather than the IndexError itself. Then modify return_user_exc
so it can call the function to create the exception before raising it.
The thing I'm not sure about are the builder instructions for converting the index, shape, and axis into strings so they can be put in the message.
Agree, |
I guess this isn't accounting for the numpy shape: >>> import numba
>>> import numpy
>>> @numba.jit
... def test(x):
... return x[:, 0]
>>> test(numpy.empty((0, 1)))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: index is out of bounds This is the source of the test failures. Is the numpy shape saved somewhere? If not, I guess we can just disable bounds checks if one of the shapes is 0. We can't get segfaults or nonsense memory values in that case. It mainly would help for type checking. |
The shape is an argument to the import numba
import numpy
@numba.jit
def test(x):
big_index = 10000000000000000
return x[big_index, big_index]
print(test(numpy.empty((0, 1)))) will likely segfault. This little snippet might help you with debugging, it lets you print an iterable from the compiled code: def print_iterable(builder, name, fmt, iterable):
printf(builder, "{}\n".format(name))
for x in iterable:
printf(builder, fmt, x)
printf(builder, "\n")
print_iterable(builder, "shape", "%d, ", shape)
print_iterable(builder, "indices", "%d, ", indices) if you paste that into the
To accommodate the builder.if_else(zero_d_predicate) as (then, otherwise):
with then:
<stuff>
with otherwise:
<stuff> might be useful? Hope this helps? |
Oh the problem isn't that it's removing the numpy shape as I was assuming. It's that bounds checking for an empty dimension shouldn't do the Also regarding the error thing, I hadn't realized that it reuses the pickled exception if it is used multiple times. I had seen it twice in there but only because I had different error messages for the upper and lower out of bounds at the time. So I was worried that you would get 2n pickled exceptions saved to the compiled function for n array accesses, but that isn't the case. Thanks for the debug suggestion. I didn't notice the |
Can we distinguish these two cases inside numba? >>> a = np.empty((0, 1))
>>> a[:, 0]
array([], dtype=float64)
>>> a[0, 0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: index 0 is out of bounds for axis 0 with size 0 The above |
This code does not segfault for me (in numba master) from numba import njit
@njit
def test(x):
return x[:, 10000000000000000000]
import numpy as np
a = np.empty((0, 1))
test(a) |
I think what we have to do is to implement separate bounds checking in I don't actually understand how numpy handles shape-0 arrays. Does it just create a placeholder pointer? |
In NumPy with a = np.empty((0, 1)), a[0, 0] raises IndexError, but a[:, 0] does not. It is impossible to distinguish these two cases in get_item_pointer, which both want a pointer to the "start" of the empty array. So I have factored out the bounds checking into a helper function, and called it separately in basic_indexing() (the same as is already done with wraparound correction). That way, we turn of bounds checking for slice indices on shape 0 axes. Note that slice indices are already not bounds checked from above, e.g., a[0:100] works even if a has fewer than 100 elements. It works this way even on Python lists.
I've implemented that in the latest commit. |
Python/NumPy doesn't do any bounds checking on slices.
This is only temporary, until we can include it in the error message proper. I have enabled it when NUMBA_FULL_TRACEBACKS is set (I don't want to add a new debug flag for it since it should hopefully be removed soon).
By the time we get to GEP I think it's probably too late, the contextual slice info is lost. I think this is because a slice looks like accessing index 0, i.e. "where does the data start for a slice == what is index 0". This https://github.com/numba/numba/pull/4432/files#diff-36c9ab8fa31f756e0da52905db0a95deR717-R722 is just working out where in data block the relevant data is. |
I figured out that issue (see my most recent comments/commits). Right now I'm trying to figure out the test failures, which are presently failing because bounds checking is still enabled by default. It looks like there are some issues with both cuda and parallel. Here's an example: from numba import njit
import numpy as np
@njit(parallel=True)
def test(X):
X[:0] += 1
return X
test(np.ones(10)) Enabling the debug output which I just added, you get
I guess parallel takes slices and converts them into explicit indices. But bounds checking should not be enabled for slices, so I need to figure out where that is happening and how to disable it for that code path, similar to what I did for the single core slice code path. I haven't looked at the cuda issues yet, but I think they are similar. As an aside, it could be a good idea to have a CI build that runs the test suite with bounds checking enabled. I already found one function in numba that failed with it enabled (see 0b95b7c). |
Were I to guess, this is where I'd start looking. Lines 2214 to 2220 in f629736
If we have the compute capacity then I think this makes sense. |
Note to self: need to find a way to get an |
@DrTodd13 This is the issue with parfors and bounds check, if you have any ideas they'd be appreciated, thanks #4432 (comment) ! |
The bounds check should only happen when the getitem is physically called, unless I am misunderstanding how things work. Is the parfor generated from the slice doing unnecessary redundant getitems? I wasn't able yet to debug why it happens (any suggestions for how to debug this sort of thing, beyond "learn how to read LLVM"?) And if it is and they're not easy to remove, is there some way to annotate the getitem so that the get_item_pointer can know it shouldn't be bounds checked? The parfor logic happens in a completely different stage from the get_item_pointer. |
The parfor slice thing is the only issue left here that I'm aware of. I still need to run the full test suite with bounds checking enable to make sure there aren't other failures I missed. As an aside, one test in the parfor tests confirmed that bounds checking does indeed break vectorization. |
I just noticed that the When testing this, the tests should be run with |
numba/cgutils.py
Outdated
|
||
def do_boundscheck(context, builder, ind, dimlen, axis=None): | ||
# Boundschecking is always disabled for CUDA | ||
from .cuda.target import CUDATargetContext |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is odd. Can we rely on turning off boundchecking at the numba/cuda/compiler.py
.
And it is likely causing other errors because the cuda
subpackage is auto-initializing the CUDA Driver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I have it this way is because I have it so that the environment variable overrides the flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This import is causing error like: https://dev.azure.com/numba/numba/_build/results?buildId=3349&view=logs&jobId=eb149424-352d-52d9-b16d-aeffaad6858a&taskId=cdac5993-002e-5764-1ca7-e646ebe78687&lineStart=1728&lineEnd=1760&colStart=1&colEnd=99
(^ the link highlights in the log but doesn't seem to scroll to it)
Traceback (most recent call last):
File "/home/vsts/work/1/s/numba/errors.py", line 717, in new_error_context
yield
File "/home/vsts/work/1/s/numba/lowering.py", line 260, in lower_block
self.lower_inst(inst)
File "/home/vsts/work/1/s/numba/lowering.py", line 303, in lower_inst
val = self.lower_assign(ty, inst)
File "/home/vsts/work/1/s/numba/lowering.py", line 467, in lower_assign
return self.lower_expr(ty, value)
File "/home/vsts/work/1/s/numba/lowering.py", line 1055, in lower_expr
expr.index_var, signature)
File "/home/vsts/work/1/s/numba/lowering.py", line 612, in lower_getitem
res = impl(self.builder, castvals)
File "/home/vsts/work/1/s/numba/targets/base.py", line 1132, in __call__
res = self._imp(self._context, builder, self._sig, args, loc=loc)
File "/home/vsts/work/1/s/numba/targets/base.py", line 1157, in wrapper
return fn(*args, **kwargs)
File "/home/vsts/work/1/s/numba/targets/arrayobj.py", line 429, in getitem_arraynd_intp
aryty, ary, (idxty,), (idx,))
File "/home/vsts/work/1/s/numba/targets/arrayobj.py", line 403, in _getitem_array_generic
basic_indexing(context, builder, aryty, ary, index_types, indices)
File "/home/vsts/work/1/s/numba/targets/arrayobj.py", line 359, in basic_indexing
cgutils.do_boundscheck(context, builder, ind, shapes[ax], ax)
File "/home/vsts/work/1/s/numba/cgutils.py", line 674, in do_boundscheck
from .cuda.target import CUDATargetContext
File "/home/vsts/work/1/s/numba/cuda/target.py", line 17, in <module>
from . import codegen, nvvmutils
File "/home/vsts/work/1/s/numba/cuda/nvvmutils.py", line 5, in <module>
from .api import current_context
File "/home/vsts/work/1/s/numba/cuda/api.py", line 298, in <module>
event_elapsed_time = driver.event_elapsed_time
AttributeError: module 'numba.cuda.simulator.cudadrv.driver' has no attribute 'event_elapsed_time'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I didn't realize that the import could fail like that. What would be the best way to fix this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add some attribute to the context that lets us detect that it is CUDA without importing it? For instance, context.name
. Or I can just check type(context).__name__
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this function should do the check at all. If we turn off boundscheck
in cuda context (#4432 (comment)) and make sure the array implementation follow the context setting, we can avoid the check here.
Here's what I think the changes should look like:
https://gist.github.com/sklam/38a31d9a5e9124dab7a77def216e0a4c
(I haven't fully tested it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll check it out. Actually this reminds me that I was going to add a test that the NUMBA_BOUNDSCHECK environment variable overrides the flag correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to modify it so that the NUMBA_BOUNDSCHECK
environment variable still works. I did this by making enable_boundscheck
a property on the context class, so that it can use the config.BOUNDSCHECK as a default and so it can be modified at runtime by the override_env_config
context manager for testing.
numba/cuda/compiler.py
Outdated
@@ -37,7 +37,7 @@ def compile_cuda(pyfunc, return_type, args, debug, inline): | |||
flags.set('no_compile') | |||
flags.set('no_cpython_wrapper') | |||
if debug: | |||
flags.set('boundcheck') | |||
flags.set('boundscheck') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should let this be off and maybe check that the flag is never turned on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It already raises an error if the flag is set (there is a test for this).
I've addressed the review comments here. |
This issue shouldn't block this pull request, but regarding getting the location of the exception, it seems that the most straightforward way is to store it on the diff --git a/numba/cgutils.py b/numba/cgutils.py
index cdce25131..c89a70565 100644
--- a/numba/cgutils.py
+++ b/numba/cgutils.py
@@ -691,12 +691,12 @@ def do_boundscheck(context, builder, ind, dimlen, axis=None):
with if_unlikely(builder, out_of_bounds_upper):
if config.FULL_TRACEBACKS:
_dbg()
- context.call_conv.return_user_exc(builder, IndexError, (msg,))
+ context.call_conv.return_user_exc(builder, IndexError, (msg,), loc=context.loc)
out_of_bounds_lower = builder.icmp_signed('<', ind, ind.type(0))
with if_unlikely(builder, out_of_bounds_lower):
if config.FULL_TRACEBACKS:
_dbg()
- context.call_conv.return_user_exc(builder, IndexError, (msg,))
+ context.call_conv.return_user_exc(builder, IndexError, (msg,), loc=context.loc)
def get_item_pointer2(context, builder, data, shape, strides, layout, inds,
diff --git a/numba/lowering.py b/numba/lowering.py
index 3fb48c03f..824f61208 100644
--- a/numba/lowering.py
+++ b/numba/lowering.py
@@ -254,6 +254,7 @@ class BaseLower(object):
self.pre_block(block)
for inst in block.body:
self.loc = inst.loc
+ self.context.loc = inst.loc
defaulterrcls = partial(LoweringError, loc=self.loc)
with new_error_context('lowering "{inst}" at {loc}', inst=inst,
loc=self.loc, errcls_=defaulterrcls):
diff --git a/numba/targets/base.py b/numba/targets/base.py
index a7b52d69e..c89b1e56c 100644
--- a/numba/targets/base.py
+++ b/numba/targets/base.py
@@ -254,6 +254,8 @@ class BaseContext(object):
self._boundscheck = False
+ self.loc = None
+
self.data_model_manager = datamodel.default_manager
# Initialize from numba import njit
import numpy as np
@njit(boundscheck=True)
def test(x):
return x[1]
test(np.array([1])) $ python test-loc.py
Traceback (most recent call last):
File "test-loc.py", line 6, in <module>
test(np.array([1]))
File "test-loc.py", line 5, in test
return x[1]
IndexError: index is out of bounds |
Hopefully this fill fix the test failures on Windows.
I ran the tests with this and there were no failures. Should I push it up here? |
RE: #4432 (comment) |
Will do. I also have some thoughts on getting the error message to include a more complete error message (which I haven't implemented yet). I'll start a new PR once this one is merged. |
running internal CI: |
this has passed internal CI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! This feature will help user avoid segfaults.
This is still a work in progress, but feedback is welcome. I'm still new to LLVM so let me know if I should be generating the code in a better way.
I have changed the API of get_item_pointer[2] to include the context argument. This is required to raise an exception.
Still TODO:
testing purposes).
TODOs that should probably wait for a future PR:
NUMBA_FULL_TRACEBACK=1
is set, the index, shape, and axis are printed as a debug message.I'll need help on how to do the last item.
There is also a
boundcheck
flag in some places in the code, which doesn't seem to do anything. I have named my flagboundscheck
with ans
, as that seemed better, but I don't particularly care if you decide another name is better.